Data Resampling for Path Based Clustering
Abstract
Path Based Clustering assigns two objects to the same cluster if they are connected by a path along which adjacent objects are highly similar. In this paper, we propose a fast agglomerative algorithm to minimize the Path Based Clustering cost function. To enhance the reliability of the clustering results, a stochastic resampling method generates candidate solutions, which are merged to yield empirical probabilities of assigning objects to clusters. The resampling algorithm measures the reliability of the clustering solution and determines the number of clusters from the stability of the resampled solutions.
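The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's agglomerative algorithm or cost function. It assumes Euclidean distance as the dissimilarity, takes the path-based (minimax) dissimilarity between two objects to be the smallest achievable value of the largest single step over all connecting paths, and uses a simple threshold rule as a stand-in clustering. The names `effective_dissimilarity`, `cluster_by_threshold`, and `co_assignment_frequency` are hypothetical. Instead of per-cluster assignment probabilities, it reports pairwise co-assignment frequencies over bootstrap resamples, a simpler proxy that avoids matching cluster labels across runs.

```python
import math
import random


def effective_dissimilarity(points):
    """Minimax path dissimilarity: for each pair, the smallest achievable
    value of the largest single step over all connecting paths."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # Floyd-Warshall variant in which (min, max) replaces (min, +).
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(d[i][k], d[k][j])
                if via_k < d[i][j]:
                    d[i][j] = via_k
    return d


def cluster_by_threshold(points, threshold):
    """Stand-in clustering (an assumption, not the paper's method): objects
    share a label when their effective dissimilarity is <= threshold."""
    d = effective_dissimilarity(points)
    labels = [-1] * len(points)
    next_label = 0
    for i in range(len(points)):
        if labels[i] == -1:
            for j in range(len(points)):
                if d[i][j] <= threshold:
                    labels[j] = next_label
            next_label += 1
    return labels


def co_assignment_frequency(points, threshold, rounds=50, seed=0):
    """Fraction of bootstrap resamples in which two objects, when both are
    drawn, receive the same label. NaN if a pair is never drawn together."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    seen = [[0] * n for _ in range(n)]
    for _ in range(rounds):
        # Draw n indices with replacement, then keep the distinct ones.
        idx = sorted(set(rng.choices(range(n), k=n)))
        labels = cluster_by_threshold([points[i] for i in idx], threshold)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                seen[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    return [
        [together[i][j] / seen[i][j] if seen[i][j] else float("nan")
         for j in range(n)]
        for i in range(n)
    ]


# Two elongated chains: unit steps along each chain, 5 units between chains.
points = [(float(x), 0.0) for x in range(6)] + [(float(x), 5.0) for x in range(6)]
d = effective_dissimilarity(points)
labels = cluster_by_threshold(points, threshold=1.5)
freq = co_assignment_frequency(points, threshold=1.5)
```

In this toy data the two ends of one chain are as far apart in Euclidean distance as the two chains are from each other, yet their path-based dissimilarity is only one step, so the threshold rule keeps each chain together. Dropping a point in a resample can break a chain, which lowers the within-chain co-assignment frequencies; that kind of sensitivity to small changes in the input is what a resampling scheme is meant to measure.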
Similar resources
Bagging for Path-Based Clustering
A resampling scheme for clustering with similarity to bootstrap aggregation (bagging) is presented. Bagging is used to improve the quality of path-based clustering, a data clustering method that can extract elongated structures from data in a noise-robust way. The results of an agglomerative optimization method are influenced by small fluctuations of the input data. To increase the reliability o...
Resampling-based selective clustering ensembles
Traditional clustering ensemble methods combine all of the clustering results at hand. However, we observe that a better clustering solution can often be achieved if only a subset of the available clustering results is combined. This paper proposes a novel clustering ensemble method, termed the resampling-based selective clustering ensembles method. The proposed selective clustering ensembles me...
Resampling Method for Unsupervised Estimation of Cluster Validity
We introduce a method for validation of results obtained by clustering analysis of data. The method is based on resampling the available data. A figure of merit that measures the stability of clustering solutions against resampling is introduced. Clusters that are stable against resampling give rise to local maxima of this figure of merit. This is presented first for a one-dimensional data set,...
Resampling for Fuzzy Clustering
Resampling methods are among the best approaches to determine the number of clusters in prototype-based clustering. The core idea is that with the right choice for the number of clusters basically the same cluster structures should be obtained from subsamples of the given data set, while a wrong choice should produce considerably varying cluster structures. In this paper I give a brief overview...
Resampling Method For Unsupervised Estimation Of Cluster
We introduce a method for validation of results obtained by clustering analysis of data. The method is based on resampling the available data. A figure of merit that measures the stability of clustering solutions against resampling is introduced. Clusters which are stable against resampling give rise to local maxima of this figure of merit. This is presented first for a one-dimensional data set, for...